229 research outputs found

    Parallel Construction of Wavelet Trees on Multicore Architectures

    Get PDF
    The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of a wavelet tree's construction by taking advantage of nowadays ubiquitous multicore machines. Our first algorithm constructs all the levels of the wavelet in parallel in O(n)O(n) time and O(nlgσ+σlgn)O(n\lg\sigma + \sigma\lg n) bits of working space, where nn is the size of the input sequence and σ\sigma is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain-decomposition fashion, using our first algorithm in each segment, reaching O(lgn)O(\lg n) time and O(nlgσ+pσlgn/lgσ)O(n\lg\sigma + p\sigma\lg n/\lg\sigma) bits of extra space, where pp is the number of available cores. Both algorithms are practical and report good speedup for large real datasets.Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094

    Técnicas de indexación y recuperación de documentos utilizando referencias geográficas y textuales

    Get PDF
    [Resumen] Internet y la World Wide Web se han convertido en un enorme repositorio de información consultado diariamente por millones de usuarios. Además, otros repositorios de información, como las bases de datos documentales o las bibliotecas digitales, también han aumentado su popularidad considerablemente. Esto ha provocado que la recuperación de información se haya convertido en una de las áreas de investigación más importantes dentro de la informática. Aunque estos repositorios contienen información de distinta naturaleza, la información más habitual es de tipo textual. A menudo, en el texto de un documento se pueden encontrar referencias geográficas que permiten asignar a ese documento una zona del espacio en la cual es relevante. Los usuarios de los sistemas que enumerábamos demandan cada vez más servicios que les permitan situar la información recuperada en un mapa. Además, también está aumentando el interés en consultas que permitan recuperar documentos relevantes no sólo para un tema determinado sino también para una zona determinada. El desarrollo de arquitecturas de sistemas, estructuras de indexación y otros componentes que permitan satisfacer estas necesidades es el objetivo principal de una nueva área de investigación denominada recuperación de información geográfica (GIR). En esta tesis abordamos varios temas de interés en el área. En primer lugar, las estructuras de indexación que permiten recuperar documentos empleando tanto su ámbito textual como su ámbito espacial no tienen en cuenta la naturaleza jerárquica del espacio geográfico ni las relaciones topológicas entre los objetos espaciales que indexan. Por tanto, nuestro primer objetivo es desarrollar una estructura que solucione los problemas debidos a estas limitaciones. Esta estructura constituye la base de la arquitectura para sistemas GIR que proponemos como segundo objetivo de la tesis. Estudiamos las limitaciones de las arquitecturas de los sistemas GIR propuestas hasta la fecha y proponemos una arquitectura genérica, modular y extensible. Además desarrollamos un prototipo de sistema basado en dicha arquitectura. Finalmente, como tercer objetivo de esta tesis proponemos una estructura para indexar objetos geográficos optimizada para las características de la información que se maneja habitualmente en sistemas GIR

    When Edge Computing Meets Compact Data Structures

    Full text link
    Edge computing enables data processing and storage closer to where the data are created. Given the largely distributed compute environment and the significantly dispersed data distribution, there are increasing demands of data sharing and collaborative processing on the edge. Since data shuffling can dominate the overall execution time of collaborative processing jobs, considering the limited power supply and bandwidth resource in edge environments, it is crucial and valuable to reduce the communication overhead across edge devices. Compared with data compression, compact data structures (CDS) seem to be more suitable in this case, for the capability of allowing data to be queried, navigated, and manipulated directly in a compact form. However, the relevant work about applying CDS to edge computing generally focuses on the intuitive benefit from reduced data size, while few discussions about the challenges are given, not to mention empirical investigations into real-world edge use cases. This research highlights the challenges, opportunities, and potential scenarios of CDS implementation in edge computing. Driven by the use case of shuffling-intensive data analytics, we proposed a three-layer architecture for CDS-aided data processing and particularly studied the feasibility and efficiency of the CDS layer. We expect this research to foster conjoint research efforts on CDS-aided edge data analytics and to make wider practical impacts

    Los sistemas de información geográfica en turismo

    Get PDF
    [Resumo] A internet converteuse nun dos lugares máis populares para publicar e buscar case calquera tipo de información. En particular, a información turística gañou moita atención na rede durante os últimos anos, e non só a información sobre viaxes, recursos, lugares, museos ou monumentos, senón tamén sobre turismo cultural. Neste artigo presentamos as posibilidades que ofrecen os sistemas de información xeográfica (SIX) para a publicación de información turística e o acceso a ela, a través de interfaces coa capacidade de xerar mapas interactivos que presenten información asociada a cada elemento de interese que apareza neles. Ademais, describimos como caso de estudo a viaxe virtual que se nos propón na Biblioteca Virtual Galega (http://bvg.udc.es), un sistema accesible a través da web que, por medio de tecnoloxías SIX, permite acceder a calquera información turística ou cultural de Galicia de xeito sinxelo.[Resumen] Internet se ha convertido en uno de los lugares más populares para publicar y buscar casi cualquier tipo de información. En particular, la información turística ha ganado mucha atención en la red durante los últimos años, no sólo información sobre viajes, recursos, lugares, museos o monumentos, sino también sobre turismo cultural. En este artículo presentamos las posibilidades que ofrecen los Sistemas de Información Geográfica (SIG) en la publicación y acceso a información turística, a través de interfaces con capacidades de generación de mapas interactivos con información asociada a cada elemento de interés presentado en los mapas. Además, describimos como caso de estudio el Viaje Virtual de la Biblioteca Virtual Gallega (http://bvg.udc.es), un sistema accesible a través de la Web que, utilizando tecnologías SIG, permite acceder a cualquier información turística o cultural de Galicia de manera sencilla.[Abstract] The Internet has become one of the most popular places to publish and search for almost any type of information. In particular, tourist information has received much attention in the Internet over the past few years, not only information about travel, resources, places, museums or monuments, but also about cultural tourism. In this article we discuss the potential offered by Geographic Information Systems (GIS) in the publication of and access to tourist information, through interfaces capable of generating interactive maps with information associated with each element of interest shown in the maps. In addition, as a case study, we describe the Virtual Trip of the Galician Virtual Library (http://bvg.udc.es), an Internet-accessible system which makes it possible, using GIS technologies, to easily access any tourist or cultural information about Galicia

    Space-Efficient Representations of Raster Time Series

    Get PDF
    Financiado para publicación en acceso aberto: Universidade da Coruña/CISUG[Abstract] Raster time series, a.k.a. temporal rasters, are collections of rasters covering the same region at consecutive timestamps. These data have been used in many different applications ranging from weather forecast systems to monitoring of forest degradation or soil contamination. Many different sensors are generating this type of data, which makes such analyses possible, but also challenges the technological capacity to store and retrieve the data. In this work, we propose a space-efficient representation of raster time series that is based on Compact Data Structures (CDS). Our method uses a strategy of snapshots and logs to represent the data, in which both components are represented using CDS. We study two variants of this strategy, one with regular sampling and another one based on a heuristic that determines at which timestamps should the snapshots be created to reduce the space redundancy. We perform a comprehensive experimental evaluation using real datasets. The results show that the proposed strategy is competitive in space with alternatives based on pure data compression, while providing much more efficient query times for different types of queries.The data used in this study were acquired as part of the mission of NASA’s Earth Science Division and archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC). Funding: CITIC, as Research Center accredited by Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported in an 80% through ERDF Funds, ERDF Operational Programme Galicia 2014-2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This work was also supported by Xunta de Galicia/FEDER-UE under Grants [IG240.2020.1.185; IN852A 2018/14]; Ministerio de Ciencia, Innovación y Universidades under Grants [TIN2016-78011-C4-1-R; RTC-2017-5908-7; PID2019- 105221RB-C41/AEI/10.13039/501100011033]; ANID - Millennium Science Initiative Program - Code ICN17_002; Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (CYTED) [Grant No. 519RT0579]Xunta de Galicia; ED431G 2019/01Xunta de Galicia; IG240.2020.1.185Xunta de Galicia; IN852A 2018/14Chile. Agencia Nacional de Investigación y Desarrollo; ICN17_00

    Privacy-enhancing distributed protocol for data aggregation based on blockchain and homomorphic encryption

    Get PDF
    The recent increase in reported incidents of security breaches compromising users' privacy call into question the current centralized model in which third-parties collect and control massive amounts of personal data. Blockchain has demonstrated that trusted and auditable computing is possible using a decentralized network of peers accompanied by a public ledger. Furthermore, Homomorphic Encryption (HE) guarantees confidentiality not only on the computation but also on the transmission, and storage processes. The synergy between Blockchain and HE is rapidly increasing in the computing environment. This research proposes a privacy-enhancing distributed and secure protocol for data aggregation backboned by Blockchain and HE technologies. Blockchain acts as a distributed ledger which facilitates efficient data aggregation through a Smart Contract. On the top, HE will be used for data encryption allowing private aggregation operations. The theoretical description, potential applications, a suggested implementation and a performance analysis are presented to validate the proposed solution.This work has been partially supported by the Basque Country Government under the ELKARTEK program, project TRUSTIND (KK- 2020/00054). It has also been partially supported by the H2020 TERMINET project (GA 957406)

    Boosting Perturbation-Based Iterative Algorithms to Compute the Median String

    Get PDF
    [Abstract] The most competitive heuristics for calculating the median string are those that use perturbation-based iterative algorithms. Given the complexity of this problem, which under many formulations is NP-hard, the computational cost involved in the exact solution is not affordable. In this work, the heuristic algorithms that solve this problem are addressed, emphasizing its initialization and the policy to order possible editing operations. Both factors have a significant weight in the solution of this problem. Initial string selection influences the algorithm’s speed of convergence, as does the criterion chosen to select the modification to be made in each iteration of the algorithm. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we employ the Half Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity within the members of the subset while at the same time fulfilling the centrality criterion. Similarly, we provide an analysis of the stop condition of the algorithm, improving its performance without substantially damaging the quality of the solution. To analyze the results of our experiments, we computed the execution time of each proposed modification of the algorithms, the number of computed editing distances, and the quality of the solution obtained. With these experiments, we empirically validated our proposal.This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 under the Marie Sklodowska-Curie under Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by the FONDECYT-CONICYT under Grant 1170497. The work of ÓSCAR PEDREIRA was supported in part by the Xunta de Galicia/FEDER-UE refs under Grant CSI ED431G/01 and Grant GRC: ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco, VIPUCT Project 2020EM-PS-08, and in part by the FEQUIP 2019-INRN-03 of the Universidad Católica de TemucoXunta de Galicia; ED431G/01Xunta de Galicia; ED431C 2017/58Chile. Comisión Nacional de Investigación Científica y Tecnológica; 2014-63140074Chile. Comisión Nacional de Investigación Científica y Tecnológica; 1170497Universidad Católica de Temuco (Chile); 2020EM-PS-08Universidad Católica de Temuco (Chile); 2019-INRN-0
    corecore